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Computing the PfafSan of a skew-symmetric matrix is a problem that arises in various fields of 
physics. Both computing the Pfaffian and a related problem, computing the canonical form of a skew- 
symmetric matrix under unitary congruence, can be solved easily once the skew-symmetric matrix 
has been reduced to skew-symmetric tridiagonal form. We develop efficient numerical methods for 
computing this tridiagonal form based on Gauss transformations, using a skew-symmetric, blocked 
form of the Parlett-Reid algorithm, or based on unitary transformations, using block Householder 
transformations and Givens rotations, that are applicable to dense and banded matrices, respectively. 
We also give a complete and fully optimized implementation of these algorithms in Fortran, and 
also provide Python, Matlab and Mathematica implementations for convenience. Finally, we apply 
these methods to compute the topological charge of a class D nanowire, and show numerically the 
equivalence of definitions based on the Hamiltonian and the scattering matrix. 

PACS numbers: 02.10.Yn, 02.60.Dc, 03.65.Vf 



I. INTRODUCTION 



A. Pfafiians and reduction to tridiagonal form 

A real or complex matrix A is called skew-symmetric (or anti- symmetric), if ^4 = — where denotes the 
transpose. The determinant det(j4) of such a skew-symmetric matrix is the square of a polynomial in the matrix 
entries, the Pfaffian Pf(^): 

det(A) = Pf{Af . (1) 
In other words, the Pfaffian of a skew-symmetric matrix is a unique choice for the sign of the root of the determinant: 



Pf (A) = ±Vdet(A) (2) 

Pfaffians arise in various fields of physics, such as in the definition of topological charges [I]-^, electronic structure 
quantum Monte Carlo ^ , the two-dimensional Ising spin glass [5 , , or in the definition of a trial wave function for the 
V ^ h/2 fractional quantum Hall state [6]. It also arises naturally from Gaussian Grassmann integration, and as such 
finds applications for example in quantum chaos [7] or lattice quantum field theory [5]. 
The Pfaffian for a 2n x 2n skew-symmetric matrix is defined as 

1 " 

Pf(^) = ^ H sgn(a)[]a,(2,_i),,(2^) (3) 

where S2n is the group of permutations of sets with 2n elements. The Pfaffian of an odd-dimensional matrix is defined 
to be zero, as in this case also det(^) = (det(A) = det(^^) = det(-yl) = (-l)^"-! det(A)). While Eq. ^ can be 
used to compute the Pfaffian directly for small matrices, its computational cost 0{n\) is prohibitively expensive for 
larger matrices. 

Analogous to the numeric computation of the determinant, a promising strategy is thus to find a transformation of 
the original matrix into a form that allows an easier evaluation of the Pfaffian. Particularly useful in this context is 
the recursive definition of the Pfaffian, 

2n 

Vi{A) = ^(-l)'anPf (^n) , (4) 

where An is the matrix A without the rows and columns 1 and i. (Note that the Pfaffian of a x matrix is defined 
as 1). Further, for an arbitrary 2n x 2n real or complex matrix B, 



Vi{BAB'^) = det(B)Pf (^) . 



(5) 
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From the recursive definition of the PfafHan Q it is obvious that the PfafHan of a 2n x 2n skew-symmetric tridiagonal 
matrix 
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IS given as 
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Furthermore, a closer inspection of Eq. Q shows that also a matrix that has only a partial tridiagonal form with 
tij = tji = only for odd i and j > i + 1 (i.e. a matrix that would be tridiagonal, if every even row and column 



would be removed). 
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allows for an easy evaluation of the PfafRan, as Pf(r) = Pf(r). 
matrix A a suitable transformation B such that 

A = BTB^ 



Our goal is therefore to find for a skew-symmetric 



(9) 



with T tridiagonal or tridiagonal in every odd row and column. 

It has been known for a while that the PfafRan of a skew-symmetric n x n matrix A can be computed in 0{n'^) 
time, using a skew-symmetric form of Gaussian elimination (adding multiples of rows and columns in a symmetric 
fashion) [U [5141 Ij . Such an skew-symmetric Gaussian elimination computes a factorization of the matrix in the form 
([9]) with B = PL where P is a permutation matrix and L a unit lower triangular matrix. For brevity, we will refer to 
this type of decomposition as LTL^^ decomposition. Gaussian elimination requires pivoting for numerical stability, 
hence the need for the permutation P. Below, we will formulate this approach in a way that allows for an efficient 
computer implementation. 

Another Gaussian based elimination technique is the LDL^ decomposition where A is reduced to D, a matrix with 
only skew-symmetric 2 x 2-blocks on the diagonal [12l US] . This approach has also been suggested for computing the 
PfafRan recently [S] [Hj. We will not persue this approach here, but show that the LTL^ decomposition allows for 
computing the PfafRan in the same number of operations and can be formulated more easily to use level-3 matrix 
operations. 

As an alternative to the Gaussian elimination based techniques, we also develop algorithms using unitary (orthogonal 
in the real case) transformations that are also known to allow for a stable numerical computation in 0{rt') for 
dense matrices. This approach doe not require pivoting for numerical stability and can more easily make use of the 
handedness of a matrix. We will describe how to compute a unitary matrix Q such in order to tridiagonalize (either 
fully or partially) A, 



A = QTQ' 



(10) 



or equivalently T — AQ* ^ where ^ denotes the Hermitian conjugate and * complex conjugation. Note that such a 
unitary congruence transformation is for the complex case quite different from the usual unitary similarity transfor- 
mations usually encounters, which are of the form A = QTQ^ . In the real case, the transformation reduces to the 
usual orthogonal similarity transformation. 
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B. Tridiagonalization and the canonical form of skew-symmetric matrices 



Apart from computing allowing for an efficient computation of the Pfaffian, the tridiagonal form of a skew-symmetric 
matrix under unitary congruence is also relevant for computing the canonical form of this matrix. 

A skew-symmetric matrix has a particularly simple canonical form under a unitary congruence transformation. For 
every skew-symmetric matrix A there exists a unitary matrix U such that |151 116j 

A^UEU'^, where S = Si ® S2 ® ••• ® Sfe ®0® ••• ®0 (11) 

where rank(A) 2k, ® denotes the direct sum, and 











> 0. 



(12) 



This canonical form has been used in the physics context for example to prove the Kramer's degeneracy of transmission 
eigenvalues [T7] and the degeneracy of Andreev reflection eigenvalues |18) . 

The problem of computing the canonical form of an even-dimensional skew-symmetric tridiagonal matrix has been 
discussed in |19H21j , the reduction of the problem with on odd-dimensional matrix to the even-dimensional case in 
[TO] , For a 2n X 2n skew-symmetric tridiagonal matrix as defined Eq. ([6]), the values of (7^, i — l..k are given by the 
k non-zero singular values of the bidiagonal matrix 
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For completeness, we give details and a self-contained derivation in appendix [K\ 

The canonical form of a skew-symmetric matrix under unitary congruence is also connected to certain eigenvalue 
problems: In the real case, the eigenvalues of A are given by ±i(Tj. In the complex case, the matrix A* A = —A^'^A 
has doubly degenerate eigenvalues a^. 



C. Skew-symmetric tridiagonalization and existing approaches 

Both the computation of the Pfaffian and of the canonical form are ultimately connected to the problem of tridiag- 
onalizing a skew-symmetric matrix. Here we give an overview of existing solutions (with implementations) that could 
be used to solve parts of the problem, and discuss the need for a new comprehensive implementation. 

For real skew-symmetric matrices, the unitary congruence transformation reduces to an ordinary orthogonal sim- 
ilarity transformation and hence established decompositions can be used [20]: The Hessenberg decomposition of a 
skew-symmetric matrix reduces to tridiagonal form ([6]), and the real Schur decomposition to the canonical form (11) 



(implemented, for example in LAPACK ^22^). However, none of these decompositions make use of the structure of 
the problem which would be desirable for precision and speed, nor can they be used for complex skew-symmetric 
matrices. 

Ward and Gray have developed and implemented algorithms to compute the tridiagonal form and the eigenvalues 
(and as an intermediate step, the canonical form) of a real dense, skew-symmetric matrix, making use of the structure 
of the problem [23]. A complex version is however not available. 

The accompanying Matlab code [21] to [13] contains a skew-symmetric LDL^ decomposition that can be used to 
compute Pfaffians, but according to the authors is not designed for efficiency. 

Very recently, Gonzalez-Ballestero, Robledo and Bertsch have developed a library for the numerical computation of 
the Pfaffian of a dense skew-symmetric matrix [25], but do not give access to the transformation matrix (e.g. needed 
for computing the canonical form). They present algorithms based on a LDL^ decomposition (called Aitken block 
diagonalization in (25j) and on Householder tridiagonalization. However, their approach does not make use of the full 
symmetry of the problem. 

None of the existing approaches (with the exception of LAPACK that does not exploit the skew-symmetry of the 
problem) makes use of block algorithms that are rich in level-3 operations and desirable for a more favorable memory 
access pattern. Below we will show that such block algorithms can give rise to a considerable increase in speed. 

Moreover, none of the above approaches makes use of the sparsity of a banded matrix, a structure that however 
often arises in practice. Below we will also consider this case in particular. 

The goal of this work is thus to develop and implement algorithms for tridiagonalizing a real or complex skew- 
symmetric matrix, making use of the skew-symmetry and possibly the handedness of the matrix. 
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D. Outline 



The remainder of the paper is organized as foUows. In Sec. |TT] we discuss algorithms to tridiagonaUze a dense or 
banded skew-symmetric matrix using Gauss transformations, Householder reflections and Givens rotations. Further, 
in Sec.|III|we discuss the details of our implementation, and present benchmarks and an exemplary application in Sec. 



IV In the appendix, we give a self-contained derivation on the computation of the canonical form of a tridiagonal. 



skew-symmetric matrix. Moreover, we discuss blocked versions of our tridiagonalization algorithms for dense matrices 
and give technical details about the Fortran implementation. 



II. SKEW-SYMMETRIC NUMERICAL TRIDIAGONALIZATION 



A. Statement of the problem 



Summarizing the discussion above, for a given skew-symmetric nxn matrix A we seek a (invertible) transformation 
B such that A — BTB^ with T in tridiagonal form tridiagonal (or in partial tridiagonal for) . Below we consider first 
an algorithm for dense matrices based on Gauss transformations requiring pivoting. Then we focus on algorithms 
bases on unitary transformations where we consider both dense and banded matrices. The discussion is presented for 
the case of complex matrices, but it carries over to the real case unchanged. 



B. LTL^ decomposition of dense matrices using the Parlett-Reid algorithm 

For symmetric or Hermitian matrices there exist efficient algorithms to compute a LTL^ or LDL^ decomposition 
(for an overview, see [20]). It has been shown by Bunch that those decompositions can in principle also be generalized 
and computed stably for skew-symmetric matrices [T^]. Below we reformulate the algorithm for the LTL^ decom- 
position of a symmetric matrix due to Parlett and Reid '26J such that it is suitable for skew-symmetric matrices. 
The Parlett-Reid algorithm is usually not the method of choice in the symmetric case, as there are more efficient 
alternatives [571 HHj- However, as we will discuss below, the Parlett-Reid algorithm can be used to compute the 
Pfajjian just as effective. 

A n X n matrix of the form 

Mk = E„-aki4"^f (14) 

(n) 

where £"„ is the nxn identity matrix and e). the fc-th unit vector in C", is called a Gauss transformation if the 
first k entries of are zero. Given a vector x — {xi . . . x„)"^ and taking otf^ — (0 ... Xk+i/xk ■ ■ ■ Xn/xkf" , M/. can 
be used to eliminate the entries fc -I- 1 . . . ti in x, A/^x — {xi . . . Xk . . .0), provided that Xk ^ 0. 

A Gauss transformation can thus be used to zero the entries in a column or row of A below a chosen point k. In 
order to avoid divisions by a small number or zero, a permutation Pk interchanging entry k with another nonzero, 
typically the largest entry in A: + 1 ... n is performed. The numerical stability of this pivoting strategy is discussed in 

m- 

Hence, a series of Gauss transformations and permutations can be used to tridiagonalize a skew-symmetric matrix 
A. To demonstrate the mechanism, assume that after applying fc — 1 Gauss transformations and permutations, the 
matrix A^''~^'> = M]^P^. . . . M2P2 A P2 Mj . . . Pj^M^ is already in tridiagonal form in the first fc — 1 columns and rows 
and hence has the form 

%1 ^12 

^(^=-1) = I A21 A23 I 1 (15) 
A32 A33 

with All e c'^-ix'^-i, A21 e C^'"'-\ A32 e C"-'=^\ ^33 e C"-'=^"-^ and A12 = -A^i, A23 = ~Aj2 (transforma- 
tions of the form BAB^ maintain skew-symmetry). Now choose a permutation matrix Pk+i such that the maximal en- 
try in A32 = (ttfe+i . . . a„)^ is permuted to the top, i.e. Fa;+i^32 = (a*:+i ■ • • On)^ where |afc+i| = max(|afe+i| , . . . |a„|). 
If the maximal element at this step is zero, A32 — and A'^''^^^ is already tridiagonal in the first fc columns 
and we set M^+i = Pk+i = En- Otherwise, we take Pk+i — dia.g{Ek, Pk+i) and Mk+i — diag(i?fe, M^+i) with 
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FIG. 1: Example of the structure induced by applying a Gauss or Householder transformation from left and right, x represents 
nonzero entries and □ those entries that are zeroed by the Gauss or Householder transformation. In this example the first 
column and row have already been reduced, and the transformation is applied to A^^^ from the left and the right. The parts 
of the matrix that are changed in the process are marked with a frame. 



Mk+i = En^k - afe+i(ej" ''■')^ with otk+i = (0 ak+2/ak+i ■ ■ ■ a.n/ak+iY ■ Then we obtain 

Mil A12 \ 

AC^) = Mk+,Pk+iA^^-'^Pl+,Ml^, = U21 A23Pf+iMj+i . (16) 

V Mk+iPk+iA^2 Affe+iPfe+aagPj+iMj+i/ 

Then Mk+iPk+iA^2 c< e^" '^^ and A^'^'' is tridiagonal in its first k rows and columns. Defining w — Pk+iA'^^P'^j^^e)^ 
we find 

Mk+iPk+iA^zPl+iMl^i = Pfe+iAgaPj+i + afe+iw^ - wa^+i . (17) 

The cross-term (eJ"~''-')-^Pfc+iA33P^-^e^"~*^-' vanishes due to the skew-symmetry of ^33. Note that in this skew- 
symmetric outer product update, the matrix Pk+iA^'iP^j^^ remains actually unchanged in the first column and row 
due to the structure of O-k+i and w. The outer product update is dominating the computational cost of each step 
and can be computed in 2(n — fc)^ flops, if the symmetry is fully accounted for. Fig. [l] shows the structure of a 
tridiagonalization step schematically for a particular example. 
After n — 2 steps, the decomposition can be written as 

PAP'^ = LTL^ (18) 

with permutation P — P„_i . . . P2, skew-symmetric tridiagonal T = M„_iP„_i . . . M2P2 AP2M2 ■ ■ ■ Pjf_iMj_2, and 
lower unit triangular matrix 

L = (Mn-iPn-i . . . MaPaP^)-^ . (19) 

As in the symmetric Parlett-Reid algorithm, the first column of L is e^""*, and the k-th column below the diagonal a 
permuted version of ccfc. 

The computation of the updated matrix, Eq. (17), is a level-2 matrix operation. It is possible to regroup these 
updates in a way that allows to operate with level-3 matrix operations that have a more favorable memory access 
pattern. The details of this block version of the Parlett-Reid algorithm are given in appendix [B] 

The full skew-symmetric LTL"^ decomposition can be computed in 2n^/3 fiops. It is however readily generalized 
to compute only a partial tridiagonal form as in Eq. ([s]) by skipping every other row and column elimination. This 
partial LTL"^ decomposition can thus be computed in /3 flops. Since det(P) — 1 and det(P) can be computed in 
n steps, computing the Pfaffian of a skew-symmetric matrix with the Parlett-Reid algorithm thus requires a total of 
flops. This is a factor of 10 less than the unsymmetric Hessenberg decomposition. 

For computing a full tridiagonalization, the Parlett-Reid algorithm requires twice as many fiops as other approaches: 
Aasen's algorithm [27J computes a (complete) LTL^ decomposition using a different order of operations in n^/3 flops, 
as does the Bunch-Kaufmann algorithm [28 for computing a (complete) LDL^ decomposition. Both algorithms can 
be generalized to the skew-symmetric case. Given the fact that computing the Pfaffian requires less information 
than a full tridiagonalization, it might seem feasible to compute the Pfaffian in /6 flops. However, neither Aasen's 
algorithm (which is based on the fact that TL^ is upper Hessenberg and hence T fully tridiagonal), nor the Bunch- 
Kaufmann algorithm (which relies on the block-diagonal structure of D) are easily amended to compute a suitable 
partial factorization. Thus, for computing the Pfaffian, the Parlett-Reid algorithm is competitive. It remains an open 
question if it is possible to compute the Pfaffian of a dense skew-symmetric matrix in less than flops. 
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C. Tridiagonalization of dense matrices with Householder reflections 

Dense symmetric or Hermitian matrices are commonly reduced to tridiagonal form by Householder transformations 
[20] . and we adopt this approach to the skew-symmetric case here. 
An order m Householder transformation H is a matrix of the form 



H = 1 — rvv 



where t G C and v G C" chosen such that 



(m) 



for a given x G C™. Here || • ||2 denotes the norm of a vector, e^™-* is the first unit vector in C™ and a G C. For 
example, a — — e**^, if one chooses v = x + e"^||a;||2e^™'' when xi = e^'^\xi\, but there is a certain degree of freedom 
in choosing the Householder vector v which can be exploited to maximize stability (for an overview, see [29] ) . Note 
that H is unitary (though not necessarily Hermitian) and can also be numerically calculated such that it is unitary 
up to machine precision pO] , 

Householder transformations can thus be applied to a matrix to zero all the elements of a column (or row) below 
a chosen point, just as Gauss transformations, but without the need for pivoting. As a consequence, the structure of 
the tridiagonalization procedure is analogous to the LTL^ decomposition. Assume that after step k — 1 the matrix 
. . . Hi AH J . . . H^_^ is already tridiagonal in the first k — 1 columns and rows and partitioned as defined in Eq. 

( 15 ). Then an order n — k Householder matrix Hk is chosen such that HkA^2 oc e^" ^'^ and the full transformation is 



set to Hk — diag(i?fc, Hk)- Writing Hk — 1 — rvv^ and defining w = tA^^v* we find 

HkAs.zHl = v433 + vw^ - wv"^ . (20) 

The main difference to the LTL"^ decomposition is the fact that the computation of w now involves a full matrix- vector 
multiplication. Hence, the total computational cost of the outer product update in Eq. pO| is 4(n — fc)^ flops. The 
structure of a Householder tridiagonalization step is also shown schematically in Fig. [T] The outer product updates of 



Eq. (20) can be rearranged to increase the fraction of level-3 matrix operations. The block version of the Householder 
algorithm is detailed in App. [B} 

Complete tridiagonalization with Householder matrices requires in total 4n'^/3 flops. This can reduced to 2n'^/3 for 
computing the Pfaffian by skipping every other row/column elimination to compute only a partial tridiagonal form. 

For the computation of the Pfaffian we also need to compute the determinant of the transformation matrix Q = 
hIhI . . . H\_2- The determinant of a single Householder transformation = 1 — r*vv^ is given as 

det(i/^) = 1 - r*vV. (21) 

For the particular choice r — 2/v^^v, dei{H) — det(iJ^) = —1, i.e. P is a refiection, but other choices of r are equally 
viable. In particular, if the matrix is already tridiagonal in certain column and row (which can happen frequently for 
very structured matrices), it is advantageous to use H = E^- Moreover, any complex skew-symmetric matrix may 
be reduced to a purely real tridiagonal matrix using appropriate Householder transformations with complex t |29j . 
Because of this, the determinant of each Householder refiection must be computed separately. The task of computing 
det((5) still only scales as cx and is thus negligible compared to the tridiagonalization cost. 

In summary, for computing the Pfaffian Householder tridiagonalization is twice as costly as the Parlett-Reid al- 
gorithm and thus usually not competitive. It has however a right on its own given its connection to computing the 
canonical form of a skew-symmetric matrix. 



D. Tridiagonalization of band matrices with Givens rotations 



The dense algorithms of the previous two sections are not easily amended to matrices with a finite band width. 
In the case of the Parlett-Reid algorithm, the symmetric pivoting can lead to an uncontrolled growth of the band 
width depending on the details of the matrix. In the Householder tridiagonalization, the outer product matrix update 
always introduces values outside the band, leading to a fast-growing band width. 

For symmetric matrices, LTLF or LDLF decomposition algorithms respecting the band width have only been 
introduced recently [301 131j . In contrast, banded tridiagonalization with unitary transformations is well established 
for symmetric matrices, and we adopt this approach for the skew-symmetric case below. 
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FIG. 2: Example for the structure induced by applying a Givens rotation G2,3 to a skew-symmetric, banded matrix from the 
left and right: x denotes nonzero entries, □ the entry that is eliminated by the Givens transformation, and Kl the entries that 
are introduced outside band (fill-in). The Givens rotation only affect the second and third row and column (marked by frames). 



Instead of zeroing a whole column or row as is done in the Householder approach, for banded matrices it is of 
advantage to use a more selective approach. The method of choice for this case in the symmetric or Hermitian case 
are Givens rotations [32], and we will extend this approach to th skew-symmetric case. A Givens rotation Gi,j is a 
modification of the identity matrix that is only different in the ith and jth row and column. It is defined as 

/l ... ••• ••• 0\ 



•■ 



•■ 







Vo 











(22) 



1/ 



with c € R, s G C and 



|sp = 1 and thus clearly unitary. Choosing c and s such that 



c 

-s* 



(23) 



it is possible to selectively zero one element of a vector. Again, a Givens rotation can be computed numerically such 
that it is orthogonal up to machine precision. 

A banded skew-symmetric matrix can be brought into tridiagonal form by Givens rotations of the form Gi^i+i. The 
structure induced in the process of applying Gi_i+i from the left and right is shown schematically in fig. [21 Applying 
a Givens rotation Gi^i+i (Gf^^i) from the left (right) only modifies the ith and {i + l)th rows (columns)!Due to the 
skew-symmetry, if Gj^i+i zeroes the (z -I- 1) entry in column j , Gf^^^i zeroes the {i + 1) entry in row j. Furthermore, 
each Givens rotation only introduces at most one additional nonzero entry outside the band in a row and column 
k > i + 1. This nonzero entry can thus be moved further down the band by a sequence of Givens transformations 
until it is "chased" beyond the end of the matrix. 

The structure of the skew-symmetric tridiagonalization routine is thus identical to the symmetric or Hermitian 
case. The main difference is in the update of the diagonal 2 x 2-block that is affected by both Givens rotations from 
left and right: Due to the skew-symmetry, the diagonal blocks are invariant under these transformations, 



-s* c [-a [s c 



a 

~a 



(24) 



Kaufman [331 134| has presented a variant of the symmetric band matrix approach of Ref. ^32,^ that allows to work on 
more data in a single operation, which allows a more favorable memory access pattern. These modifications carry 
over unchanged to the skew-symmetric case. 

The tridiagonalization of an n x n skew-symmetric matrix with bandwidth h using Givens transformations scales 
as 0{bn'^). 

The determinant of any single Givens rotation det(Gij) — 1, and thus the determinant of the full transformation 
det((5) = 1, too. In the complex case the resulting tridiagonal matrix can be chosen to be purely real, in this case 
the determinant of total unitary transformations (the Givens transformations and row/columns-scalings with a phase 
factor) obey |det((5)| = 1. 
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SKTRF 


Skew-symmetric tridiagonal decomposition of a dense matrix using the block 
Parlett-Reid algorithm. 


SKTF2 


Skew-symmetric tridiagonal decomposition of a dense matrix using the Parlett-Reid 
algorithm (unblocked version). 


SKTRD 


Skew-symmetric tridiagonalization of a dense matrix using block Householder 
reflections. 


SKTD2 


Skew-symmetric tridiagonalization of a dense matrix using Householder reflections 
(unblocked version). 


bKPFA 


Compute the Pfafhan of a dense skew-symmetric matrix (makes use of either SKTRD 

or bKlHr j. 


SKPFIO 


Compute the Pfafhan of a dense skew-symmetric matrix (makes use of either SKTRD 
or SKTRF). The result is returned as a x 10^ to avoid over- or underflow. 


SKBTRD 


Skew-symmetric tridiagonalization of a banded matrix using Givcns rotations. 


SKBPFA 


Compute the Pfafhan of a banded skew-symmetric matrix (makes use of SKBTRD). 


SKPFIO 


Compute the Pfafhan of a banded skew-symmetric matrix (makes use of 
SKBTRD). The result is returned as a x lO'' to avoid over- or underflow. 



TABLE L Overview of the computational routines in the Fortran implementation. In the Fortran?? interface the routine 
name must be preceded by either S (single precision), D (double precision), C (complex single precision), or Z (complex double 
precision) . 

III. NOTES ON THE IMPLEMENTATION 
A. Fortran 

We have implemented the algorithms described in this manuscript as a comprehensive set of Fortran routines for 
real and complex variables as well as single and double precision. Because of the conceptional similarity of the 
skew-symmetric problem to the symmetric and Hermitian problem, these routines are designed analogous to to the 
corresponding symmetric and Hermitian counterparts in LAPACK. Moreover, our implementation also makes use 
of the LAPACK framework for computing, applying, and accumulating Householder and Givens transformations, 
which was designed for numerical stability and which is available in an optimized form for any common computer 
architecture. 

Dense skew-symmetric matrices are stored as ordinary two-dimensional Fortran matrices, but only the strictly lower 
or upper triangle needs to be set (for differences in the implementation between lower and upper triangular storage 
see App. [C]). For banded skew-symmetric matrices, only the strictly upper or lower diagonals are stored in a K x N 
array AB, where K is the number of non-zero off-diagonals and N the size of the matrix. The j-th column of the matrix 
A is stored in the j-th column of AB as 

• AB(K + l + i — = Aij for max(l, j — K) <= i <= j, if the upper triangle is stored, 

• AB(1 + i — j, j) = Ai,j for j <= i <= min(N, j + kd), if the lower triangle is stored. 

Note that in this scheme, also the zero diagonal is explicitly stored. This was done to keep the design identical to the 
storage scheme of symmetric and Hermitian band matrices in LAPACK. 

Our library includes stand-alone routines for the tridiagonalization of a skew-symmetric dense matrix (SKTRF and 
SKTF2 using the Parlett-Reid algorithm, SKTRD and SKTD2 using the Householder approach) and banded matrices 
(SKBTRD). We also include functions to compute the Pfafhan of a skew-symmetric dense (SKPFA and SKPFIO) and 
banded matrices (SKBPFA and SKBPFIO), which are based on the tridiagonalization functions. As the determinant, the 
Pfafhan of a large matrix is prone to floating point over- or underflow. Because of that, we have included routines 
that return the Pfafhan in the form a x 10^, where a is real or complex, and b is always real and integer (SKPFAIO and 
SKBPFIO). Both a Fortran95 and a Fortran?? interface are provided. In the Fortran?? version of the code the routine 
name is preceded by either S (single precision), D (double precision), C (complex single precision), or Z (complex 
double precision) . The computational routines and their purpose are summarized in Table |l] 

The block versions of the algorithm have an internal parameter controlling the block size. By default, the routines 
use the same block sizes as their symmetric counterpart from the LAPACK library. However, this internal parameter 
may be changed by the user to optimize for a specific architecture. 

Apart from the documentation here, all routines (including the auxiliary ones) are documented extensively in the 
respective files. Due to our routines using LAPACK and BLAS, both libraries must be also linked. 
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Block 
Parlett- 
Reid 


Regular 
Parlett- 
Reid 


Block 
House- 
holder 


Regular 
House- 
holder 


Givens for 
band 
matrix 


DGEHRD 

(Lapack) 
[22] 


TRIZD 
from [23] 


PfaffianH 
from [25] 


Pfaff ianF 
from [25] 


(a) benchmark for AMD Opteron 6174 2.2 Ghz 


real 3000 x 3000, 
dense 


5.1 


5.9 


9.4 


10.5 




24.7 


383.4 


54.5 


120.8 


real 3000 x 3000, 
banded, k = 100 


0.9 


0.7 


9.3 


10.5 


2.1 


24.8 


383.2 


54.3 


121.7 


complex 2000 x 2000, 
dense 


3.5 


4.4 


7.6 


8.2 








32.2 


50.1 


complex 2000 x 2000, 
banded k = 100 


0.8 


0.7 


7.6 


8.1 


2.0 






31.9 


50.2 


(b) benchmark for Intel Core 2 Duo E8135 2.66 Ghz 


real 3000 x 3000, 
dense 


3.5 


8.3 


8.4 


12.4 




30.7 


105.3 


76.3 


49.4 


real 3000 x 3000, 
banded, k = 100 


0.7 


0.5 


8.4 


12.2 


1.4 


30.4 


105.4 


75.9 


48.5 


complex 2000 x 2000, 
dense 


3.5 


5.0 


7.5 


8.3 








48.3 


28.3 


complex 2000 x 2000, 
banded, k = 100 


0.8 


0.7 


7.5 


8.2 


3.0 






49.3 


26.3 



TABLE 11: Benchmark comparison of the implementation of this work and other numerical approaches to compute the Pfaffian 
of a skew-symmetric matrix. The table shows the time needed to compute the Pfaffian for the various methods (time given in 
seconds) on two different architectures [(a) and (b)]. The first five columns of benchmark results correspond to the methods 
discussed in this work. For the banded matrices, k denotes the strictly upper or lower bandwidth, the full bandwidth is hence 
2k + 1. 



B. Python, Matlab and Mathematica 



While most compiled languages (including C and C++) are easily interfaced with a Fortran library, interpreted 
languages such as Python or programs such as Matlab or Mathematica require somewhat more effort. For this reason 
we have included stand-alone versions of the tridiagonalization of dense skew-symmetric matrices using Householder 
reflections implemented in Python, Matlab and Mathematica. Those implementations, being of course slower than 
the Fortran counterpart, are useful especially for situations where speed is not critical. Both implementations also 
make use of the fact that for computing the Pfaffian, only the odd rows and columns need to be tridiagonalized, but 
always work on the full matrix instead of a single triangle. 

Again, more extensive documentation for both implementations may be found in the respective files. 



IV. EXAMPLES 



A. Benchmarks 



To demonstrate the effectiveness of our methods, we have performed benchmark computations of the Pfaffian of 
large, random matrices on various architectures. In Table [IT] we compare our approach with the other available 

For this benchmark we 
using the same compiler 



software that can also be used to calculate the Pfaffian in certain situations (see Sec. IC 



have compiled our Fortran implementation, and the implementations of Refs. | 23j and |25 
and compilation options, and chose a machine-optimized version of the LAPACK library [52]. 

From the benchmark results we can see that the block approach is always faster than the unblocked version. The 
relative speed-up depends strongly on the architecture, but can reach up to 60%. We also observe that the relative 
speed-up of the Parlett-Reid algorithm is larger than of the Householder tridiagonalization, reflecting the fact that 
the level-3 content of the former is larger (see App. [b|). 

For the banded random matrices we observe that the Parlett-Reid algorithm performs surprisingly well. Although it 
is not designed to make use of the handedness of the matrix, the implementation of the skew-symmetric outer product 
update takes into account zeros in the vectors of the update. The Householder tridiagonalization does not benefit as 
much, as for the matrices here the band width growth in the Householder approach is faster than in the Parlett-Reid 
algorithm. It should be stressed however that the performance of these algorithms in the banded case depends on the 
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actual values of the matrix. For example, band width growth is stronger in the Parlett-Reid algorithm, if the largest 
entries sit at the edge of the band. 

The specialized approach for banded matrices using Givens transformations is still slightly slower than the Parlett- 
Reid algorithm for the matrix sizes considered here. The main benefit of the specialized algorithm for PfafRan 
calculations is hence its much lower memory requirement. In fact, typically memory limits the matrix sizes that can 
be handled, not computational time. The banded Givens-based approach however is considerably faster than the 
Householder tridiagonalization which makes it very attractive for computing the canonical form (or eigenvalues in the 
real case). 

Comparing to other approaches, we observe that our routines are always faster, typically by a factor of about 10 
or more. In particular in the real case, our specialized approach is considerably faster than using the real Hessenberg 
reduction, although we do not always reach the full speed-up of a factor of 10 that can be expected from the operation 
count, which is due to the optimization of the LAPACK library used. The implementation for real matrices of Ref. 
|23j is particularly slow as its memory access pattern is somewhat unfavorable for modern computer architectures. 



B. Application: topological charge of a disordered nanowire 



Finally, we apply our approach to computing the Pfaffian to a physical example, namely the numerical computation 
of the topological charge of a disordered nanowire. 

A nanowire made out of a topological superconductor supports at its two ends Andreev bound states pinned at 
the Fermi energy [Tl [551 - 155] . Because of particle-hole symmetry, those states are Majorana fermions - particles that 
are their own anti-particle - and may allow for topologically protected quantum computing [1,. In contrast, an 
ordinary (trivial) superconducting wire does not support such states. The recent proposal to realize a topological 
superconductor using ordinary semiconducting and superconducting materials |36H38j has stirred a lot of interest 
towards Majorana physics in condensed matter. 

A topological charge Q is a quantity that indicates whether a system is in the trivial or topological state, and 
hence signifies the absence or existence of Majorana bound states. A superconducting system exhibits particle-hole 
symmetry which allows the Hamiltonian to be written in the form p] 



hi 



where A is a skew-symmetric matrix, and q, Cj Majorana operators with — Ci, = 1 and CiCj + CjCi = 6ij. 
Below we further specialize to the case where particle-hole symmetry is the only remaining symmetry, i.e. broken 
time-reversal and spin-rotation symmetry, which puts the system into class D of the general symmetry classification 
scheme [39] . 

Kitaev has shown that for a translationally invariant wire, 

Q = sign Pf [A(0)]Pf [A(^/a„e)] (26) 

is a topological charge that signifies the absence {Q — 1) or existence {Q — —1) of Majorana bound states at the 
ends. In this expression, A{k) is the Hamiltonian in (Bloch) momentum space written in the Majorana basis. A{k) 
is a matrix with a size corresponding to the size of the unit cell, the unit cell length is denoted by a^c- Note that the 
PfafRan needs only be evaluated at two values of momenta which correspond to closing the unit cell with periodic 
(k ~ 0) and anti-periodic (fc = i: /a-yj^^) boundary conditions. 



For a clean system, the size of A{k) is cx W , where W is the width of the wire, and Eq. ( 26 ) has been used previously 



to compute the topological charge [T1|3S]. A disordered system can be considered (up to finite size effects, see below) 



in Eq. (26) as a large, disordered supercell repeated periodically. In this case, the size of A{k) is oc WL^ where L 



is the length of the supercell. This implies that A{k) in a disordered system will be a very large matrix, and we 



are not aware of any application of Eq. (26) for such as system. However, the sparse structure of A, in particular 



its handedness, allows the application of the special algorithms developed in this work, and allows for the first time 



applying (26) to large, disordered systems. 



Recently, an alternative definition of the topological charge for class D systems has been shown [30] . In contrast to 



Eq. (26) which is based on properties of the Hamiltonian, this alternative definition is based on transport properties: 

Q==signdetr, (27) 



where r is the reflection matrix. This definition is equally applicable to clean and disordered systems. Below we show 
numerically the equivalence of the definitions (26) and (27). 
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FIG. 3: Topological charge of a semiconductor nanowire in proximity to a superconductor for vario us d isorder strengths as a 
function of the Fermi energy, computed from the Pfaffian of the Hamiltonian in the Majorana basis ( 26 1 (solid lines) and from 
the determinant of the reflection matrix (27 1 (dashed lines). Parameters of the calculation were W = Iso, L = lOZso, a = Zso/20, 
A = lOSso, and QcftfiBB — 21. 



For this we use the model of Refs. [33 |SZ| with a normal state Rashba Hamiltonian 



Hn = 



2m, 



off 



U{r) 



(28) 



where rrioff is the effective mass of the two-dimensional electron gas, a the Rashba spin-orbit coupling, and gcS^J■BB the 
Zeeman splitting due to an external magnetic field. Characteristic length and energy scales are /go = ^^/m-offaso and 
^'so = Weffag^/S^. The electrons are then confined into a nanowire of width W and length L in the x — y-plane. For 
the numerical treatment, the Hamiltonian is discretized on a square grid with lattice constant a and thus represented 
by a matrix Hij^^^, where i, j denote lattice sites, and /i, v the spin degrees of freedom. Disorder is introduced as a 



random on-site potential taken from the uniform distribution [—Uq,Uo]. 
with a s-wave superconductor then reads 



The Hamiltonian of the system in contact 



H 



(29) 



where A is the proximity-induced pair potential. Defining Majorana operators as C2i-i,p = -^{o-i,/^ + ^1^) £^nd 
C2i,fi — -75(01. /J ^ '^i^) can transform Eq. (29) into the form (25) with a skew-symmetric matrix A (whose 



bandwidth is reduced for the numerics with a bandwidth reduction algorithm j41j). 

In Fig. [3] we show the numerical results for the topological charge as defined in Eqs. (26) and (27). For the 
computation of the Pfaffian in (26) we apply the Givens based method from Sec. II D for computing the reflection 



matrix the numerical method of Ref . [52] . 

In all cases, clean and disordered, both definitions of Q agree very well. In particular, both definitions predict a 
vanishing of the topological phase for the largest disorder in the region 10 < E-p/E^o < 25. There are only very small 
differences in the exact value of Ep where the topological charges change sign. These differences can be explained by 
finite size effects: At these points the bulk of the nanowire has a significant conductance (| det r| ^ 1) which in turn 
means that the different geometry of Eqs. (26) (infinite repetition of a supercell) and (27) (single supercell connected 
to metallic leads) matter. In contrast, in the regime where the bulk of the wire is fully insulating (| detrj w 1), both 
definitions agree fully. 



The algorithmic developments in this work have allowed to evaluate Eq. ( 26 ) for a fairly large disordered system 



The bandwidth of the respective skew-symmetric matrix A scales cx W, and hence the cost of tridiagonalization as 
oc W^L^. In contrast, the definition of the topological charge from transport properties (27) scales oc W^L [32]. Hence, 
from a computational viewpoint, Eq. (27) is more favorable. It is thus reassuring that our numerical experiments 



showed the equivalence of both definitions. 



V. CONCLUSIONS 



We have shown that both the computation of the Pfaffian and the canonical form of a skew-symmetric matrix can be 
solved easily once the matrix is reduced to skew-symmetric (partial) tridiagonal form. To find this form, we have then 
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developed tridiagonalization algorithms based on Gauss transformations, using a skew-symmetric, blocked version of 
the Parlett-Reid algorithm, and based on unitary transformations, using block Householder reflections and Givens 
rotations, applicable to dense and skew-symmetric matrices, respectively. These algorithms have been implemented 
in a comprehensive numerical library, and its performance has been proven to be superior to other approaches in 
benchmark calculations. Finally, we have applied our numerical method for computing the Pfaffian to the problem of 
computing the topological charge of a disordered nanowire, showing numerically the equivalence of different methods 
based on the Hamiltonian or the scattering matrix of the system. 
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Appendix A: The computation of the canonical form of a skew-symmetric matrix 



The problem of computing the canonical form of an even-dimensional skew-symmetric matrix has been discussed 
in |19H21j (the odd-dimensional problem can be reduced to an even-dimensional one by a series of Givens rotations 
[TO]). Here we give a self-contained derivation for completeness. 

A 2n X 2n skew-symmetric matrix A can be reduced to tridiagonal form A — QTQ^ with Q unitary (orthogonal in 
the real case) and T tridiagonal as given in Eq. ([6|. This tridiagonal matrix can be rewritten as 



T = P 



-J 



with J as given in Eq. ( 13 1 and P the permutation 

P 



1 2 
1 3 



n n + 1 n 2 
2n - 1 2 4 



2n 
2n 



From the singular value decomposition (SVD) of J = ySVF, with E = diag((Ti, 
and U, V unitary (orthogonal in the real case) matrices, we then obtain 



T = P 



w 



p' 



(Al) 
(A2) 

,0), ai > 0, k — rank(J), 
(A3) 



But since P^P = 1 and P 



with 



P^ = S with S defined in Eq. (11 1, we find the canonical form of A as 



A = UEU^ 



U^QP 



V 



p' 



(A4) 



(A5) 



In practice, it suffices to implement the SVD for real J, as any complex matrix can be reduced to real tridiagonal 
form using unitary transformations. For the computation of the SVD one can make use of the computational routines 
for bidiagonal matrices from LAPACK [22] . 



Appendix B: Block versions of the Parlett-Reid and Householder tridiagonalization algorithms 



The application of the Gauss and Householder transformations in the update operations Eqs. (17 1 and (20 1 are 
inherently level-2 matrix operations. It is however possible to accumulate transformations and apply those simul- 
taneously in a block representations [20] which has a higher level-3 fraction. This procedure is also used for the 
tridiagonalization of symmetric matrices ^43] and we describe its application to the skew-symmetric case below. 
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Both algorithms are based on transformations of the form 

En - VfcU^ , (Bl) 

and the update operation is of the form 

AW^^C^-D+v.wI^-w.v^ (B2) 

with Wfc — A^'^^^^Uk- 

Assume that the matrix after the fc-th update is given as 

A^''^ =A + VkW^ - WkV^ (B3) 

where Vfc and Wk are n x fc-matrices. For fc = 1, Vi = vi and Wi = wi. The k + 1-th update can then be written as 

= A + YuWl - WkYi + vfc+iwi'+i - wfc+ivi^+i (B4) 
= A + Vfc+iW^J+i - W^J+i^fe+i , 

where Vfc+i = (T4,Vfe+i) and M^fc+i = (VFfe,Wfc+i), and 

Wfc I 1 = A^'^^Ufc I 1 

^ (B5) 
= Aufe+i + VfeVFfe Ufe+i - WfcVfe Ufc+i 

can also be computed without forming A^^^ explicitly. 

Of course, while it may not be necessary to compute the full A^^^ explicitly, the determination of the vectors 
v/j+i and w.k+\ requires the knowledge of the fc + 1-th row or column of J^^'^ . In practice, the matrix A is therefore 
partitioned into panels of r rows and columns, r successive Householder reflections are then computed and applied 
one by one to the r rows and columns in the current panel, but not to the remaining matrix. Instead, they are 
accumulated as described above and the remaining part of the matrix is updated in one block update of the form 



(B3| which is rich in level-3 matrix operations. 

In the case of the Parlett-Reid algorithm, is a unit vector and hence the computation of w/j has little cost. 
In this case the computational cost is dominated by the outer product update and hence the block version consists 
almost entirely of level-3 matrix operations. This is not the case for the Householder tridiagonalization where the 
matrix-vector multiplication to compute remains inherently a level-2 matrix operation. 



Appendix C: Upper versus lower triangle storage in the Fortran implementation 

In order to make full use of the skew-symmetry of the problem, it is essential that an algorithm only works with 
either the lower or upper triangle of the matrix. This is done in the Fortran implementation. However, in this case 
it is also mandatory to use mainly column instead of row operations, as Fortran matrices are stored contiguously 
in memory column-by-column. For this reason, the Fortran code implements the algorithms described in this paper 
verbatimly only if the lower triangle of the matrix is provided. Below we briefly describe the differences when the 
upper triangle is used. 

Instead of starting the tridiagonalization in the first column of the matrix, the versions using the upper triangle 
start in the last column. 

If a partial tridiagonalization is computed, it is not of the form ([8|, but has — tji = only for i even and 
j < i — 1. This amounts to interchanging rows and columns 1 and n, 2 and n — 1, . . . , n/2 and n/2 -I- 1 in Eq. (|8|. 
However, since the determinant of this permutation is equal to one, the value of the Pfaflaan does not change. 

In the case of the Parlett-Reid algorithm, a UTU'^ decomposition is computed, where U is an upper unit triangular 
matrix. 
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